453 research outputs found
Microbial metal resistance and metabolism across dynamic landscapes: high-throughput environmental microbiology.
Multidimensional gradients of inorganic compounds influence microbial activity in diverse pristine and anthropogenically perturbed environments. Here, we suggest that high-throughput cultivation and genetics can be systematically applied to generate quantitative models linking gene function, microbial community activity, and geochemical parameters. Metal resistance determinants represent a uniquely universal set of parameters around which to study and evaluate microbial fitness because they represent a record of the environment in which all microbial life evolved. By cultivating microbial isolates and enrichments in laboratory gradients of inorganic ions, we can generate quantitative predictions of limits on microbial range in the environment, obtain more accurate gene annotations, and identify useful strategies for predicting and engineering the trajectory of natural ecosystems
Using Agent-Based Modelling to Investigate Intervention Algorithms to Reduce Polarisation in Online Social Networks
Across much of the western world, political polarisation is on the rise. This has the effect
of hindering political discourse, stifling open discussion, and in extreme cases has led to
violence. The process of polarising and radicalising vulnerable individuals has migrated to
social media websites, which have been implicated in several high profile terror attacks.
Within this thesis we model and investigate various algorithms to prevent the spread
of polarisation and extremist ideology by employing agent-based modelling techniques
from the field of opinion dynamics. The contributions of our work include the following
aspects.
Firstly, we have developed a unified framework for opinion dynamics, allowing us
to experiment easily on a number of different existing models and bringing together
sometimes disparate innovations from across the field into one system.
Secondly, this unified framework has been implemented in a modular simulator able
to perfectly replicate results from purpose-built, stand-alone simulators for two widely
used models, namely Relative Agreement and CODA, and then released to the public as
the first general-purpose opinion dynamics simulator.
Thirdly, we have developed two new intervention algorithms, along with a new metric
for measuring the effectiveness of an intervention strategy, which aim to reduce the
spread of polarisation across a network with low computational cost. These methods are
compared to existing centrality-based methods upon a random network. The experimental
results show our proposed approaches outperform centrality measures. We find that our
algorithms are able to prevent up to 40% of non-extremist agents becoming extreme by
removing only 10% of the network’s edges.
Fourthly, we have investigated the efficacy of these intervention algorithms on polarisation under different scenarios (e.g. variable costs, different network structures).
The experimental validation indicates that the proposed approach is robust and performs
favourably compared with existing methods such as centrality-based methods, especially on
the second type of network.
Finally, we have developed a broadcast-based communication system for agents,
designed to mimic the one-way broadcast nature of a public social media post, such as
on Twitter, in contrast to the existing model, which emulates a two-way private conversation. The experimental results show a lessening of the impact of our interventions,
demonstrating the need for further investigation of such communication methods.
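The abstract names Relative Agreement as one of the models the simulator replicates but does not spell out its update rule. The following is a minimal sketch, assuming the form of the rule commonly described in the opinion dynamics literature, in which each agent holds an opinion `x` and an uncertainty `u`, and agent i influences agent j only when their opinion segments overlap by more than `u_i`. The function name and the parameter `mu` are illustrative choices, not taken from the thesis, and the thesis's own simulator may differ in detail.

```python
def relative_agreement_update(x_i, u_i, x_j, u_j, mu=0.5):
    """Return agent j's new (opinion, uncertainty) after hearing agent i.

    Assumed rule: the overlap h of the segments [x - u, x + u] must exceed
    u_i for any influence to occur; the strength of influence is the
    relative agreement h / u_i - 1.
    """
    overlap = min(x_i + u_i, x_j + u_j) - max(x_i - u_i, x_j - u_j)
    if overlap <= u_i:
        # Not enough overlap: agent j ignores agent i.
        return x_j, u_j
    agreement = overlap / u_i - 1.0
    new_x = x_j + mu * agreement * (x_i - x_j)
    new_u = u_j + mu * agreement * (u_i - u_j)
    return new_x, new_u


# A confident agent (small u) pulls a nearby, less certain agent towards it.
pulled = relative_agreement_update(0.0, 0.3, 0.2, 0.4, mu=0.5)
# Distant, confident agents do not influence each other at all.
ignored = relative_agreement_update(0.0, 0.1, 0.9, 0.1, mu=0.5)
```

Under this assumed rule, confident agents are more persuasive than uncertain ones, which is one mechanism by which a small group of extremists can pull a population towards the extremes.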
Cold Fusion: Training Seq2Seq Models Together with Language Models
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks
which involve generating natural language sentences such as machine
translation, image captioning and speech recognition. Performance has further
been improved by leveraging unlabeled data, often in the form of a language
model. In this work, we present the Cold Fusion method, which leverages a
pre-trained language model during training, and show its effectiveness on the
speech recognition task. We show that Seq2Seq models with Cold Fusion make
better use of language information, enjoying (i) faster convergence and
better generalization, and (ii) almost complete transfer to a new domain while
using less than 10% of the labeled training data
Forest Management in Coastal Pine Forests: An Investigation of Prescribed Fire Behavior, Detrital Chemical Composition, and Potential Water Quality Impacts
Prescribed fire, thinning, and mastication are common forest management practices implemented in southern pine forests. These practices affect ecosystem properties differently depending upon the intensity at which they are implemented. One ecosystem property of interest is the chemical composition of forest detritus, commonly referred to as the litter and duff. This material is largely responsible for the replenishment of organic resources into soils. It may also be a primary contributor to surface water quality. In this study we were given an opportunity to evaluate two long-term forest management strategies at two sites along the South Carolina coastal plain to determine their effects on forest detrital chemical composition and potential water quality: 1) frequent prescribed fire (annual and biennial) and 2) a combination of periodic prescribed fire (every 3-4 years) and singular implementations of tree thinning and understory mastication. Based upon our analyses, we confirmed that the prescribed fires implemented on these sites display the characteristics of low intensity, low severity surface fires. As such, fuel quantities decreased as a result of forest management at both sites. At one of our sites, the Tom Yawkey Wildlife Center in Georgetown, South Carolina, the chemical functional groups of forest detritus were not greatly altered by fire. Specific compounds within these groups may have been affected by fire, but returned to or fell below long-term unburned levels within one-year post-fire. On our other site, the Santee Experimental Forest, it appears that long-term forest management has altered overstory species composition and subsequently detrital chemical composition. At both sites, potential organic pollutants were reduced by the forest management practices. This reduction may be beneficial in terms of water treatment and human health. 
These results add to the long list of benefits noted in the literature for active forest management, particularly the benefits of prescribed fire
Scaling Deep Learning on GPU and Knights Landing clusters
The speed of deep neural network training has become a major bottleneck in
deep learning research and development. For example, training GoogleNet on the
ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up the training
process, current deep learning systems rely heavily on hardware
accelerators. However, these accelerators have limited on-chip memory compared
with CPUs. To handle large datasets, they need to fetch data from either CPU
memory or remote processors. We use both self-hosted Intel Knights Landing
(KNL) clusters and multi-GPU clusters as our target platforms. From an
algorithmic perspective, current distributed machine learning systems are mainly
designed for cloud systems. These methods are asynchronous because of the slow
network and high fault-tolerance requirement on cloud systems. We focus on
Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. The original
EASGD uses a round-robin method for communication and updating. The communication
is ordered by the machine rank ID, which is inefficient on HPC clusters.
First, we redesign four efficient algorithms for HPC systems to improve
EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD
are faster than their existing counterparts (Async SGD,
Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design
Sync EASGD, which ties for the best performance among all the methods while
being deterministic. In addition to the algorithmic improvements, we use some
system-algorithm codesign techniques to scale up the algorithms. By reducing
the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x
speedup over original EASGD on the same platform. We get 91.5% weak scaling
efficiency on 4253 KNL cores, which is higher than the state-of-the-art
implementation
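For readers unfamiliar with elastic averaging, here is a minimal sketch of the basic update in its synchronous form, assuming the standard rule in which each worker's parameters are pulled towards a shared center variable and the center is pulled towards the workers. It is a single-process toy, not the authors' Sync EASGD implementation; the function names, the quadratic objective, and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def sync_easgd(grad, dim, workers=4, steps=200, lr=0.1, rho=0.5, seed=0):
    """Toy synchronous elastic averaging SGD on plain Python lists.

    grad(w, rng) must return a stochastic gradient at parameters w.
    Returns the center variable and the list of worker parameters.
    """
    rng = random.Random(seed)
    center = [0.0] * dim
    local = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(workers)]
    for _ in range(steps):
        # All elastic differences use the same snapshot of the center.
        diffs = [[x - c for x, c in zip(w, center)] for w in local]
        for w, d in zip(local, diffs):
            g = grad(w, rng)
            for k in range(dim):
                # Gradient step plus elastic pull towards the center.
                w[k] -= lr * (g[k] + rho * d[k])
        for k in range(dim):
            # The center moves towards the workers by the summed differences.
            center[k] += lr * rho * sum(d[k] for d in diffs)
    return center, local


# Hypothetical objective: 0.5 * ||w - target||^2 with small gradient noise.
target = [1.0, -2.0]


def noisy_quadratic_grad(w, rng):
    return [wk - tk + rng.gauss(0.0, 0.01) for wk, tk in zip(w, target)]


center, local = sync_easgd(noisy_quadratic_grad, dim=2)
```

In a real cluster setting, the two inner loops correspond to communication between the workers and whichever process holds the center variable, and this communication is the cost the abstract reports reducing from 87% to 14%.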
Training Big Random Forests with Little Resources
Without access to large compute clusters, building random forests on large
datasets is still a challenging problem. This is, in particular, the case if
fully-grown trees are desired. We propose a simple yet effective framework that
allows ensembles of huge trees to be constructed efficiently for hundreds of
millions or even billions of training instances using a cheap desktop computer
with commodity hardware. The basic idea is to consider a multi-level
construction scheme, which builds top trees for small random subsets of the
available data and which subsequently distributes all training instances to the
top trees' leaves for further processing. While being conceptually simple, the
overall efficiency crucially depends on the particular implementation of the
different phases. The practical merits of our approach are demonstrated using
dense datasets with hundreds of millions of training instances.
Comment: 9 pages, 9 figures
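The multi-level construction scheme described in this abstract can be illustrated with a small toy, assuming a two-level variant: a shallow top tree is grown on a random subset, every training instance is routed to a top-tree leaf, and a fully-grown bottom tree is then built per leaf. This is a sketch of the idea only, not the authors' implementation; the CART-style splitter, the function names, and the synthetic data are all assumptions.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth=None, depth=0):
    """Grow a simple axis-aligned tree; fully grown when max_depth is None."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or (max_depth is not None and depth >= max_depth):
        return {"leaf": True, "label": majority}
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature can separate the remaining rows
        return {"leaf": True, "label": majority}
    _, f, t, left, right = best
    return {
        "leaf": False, "f": f, "t": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, depth + 1),
    }


def find_leaf(tree, row):
    while not tree["leaf"]:
        tree = tree["left"] if row[tree["f"]] <= tree["t"] else tree["right"]
    return tree


def build_two_level_tree(X, y, subset_size, top_depth, rng):
    # Level 1: shallow top tree on a small random subset.
    idx = rng.sample(range(len(X)), subset_size)
    top = build_tree([X[i] for i in idx], [y[i] for i in idx], top_depth)
    # Level 2: distribute ALL instances to the top tree's leaves.
    buckets = {}
    for row, label in zip(X, y):
        leaf = find_leaf(top, row)
        rows, labels = buckets.setdefault(id(leaf), (leaf, [], []))[1:]
        rows.append(row)
        labels.append(label)
    # Each leaf's bucket gets its own fully-grown bottom tree.
    for leaf, rows, labels in buckets.values():
        leaf["bottom"] = build_tree(rows, labels)
    return top


def predict(top, row):
    return find_leaf(find_leaf(top, row)["bottom"], row)["label"]


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(a + b > 1.0) for a, b in X]
model = build_two_level_tree(X, y, subset_size=20, top_depth=2, rng=rng)
accuracy = sum(predict(model, r) == l for r, l in zip(X, y)) / len(X)
```

The point of the design is that each bottom tree only ever sees the instances in its own bucket, so the buckets can be processed one at a time within limited memory. An ensemble would repeat the procedure with different random subsets.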